Final Project Report

This report summarizes the applications and algorithms whose development was enabled by
the equipment acquired under the grant. We also outline the impact of this parallel
computing resource on our future research roadmap.

Computational  Fluid  Dynamics 

One  of  the  most  important  advances  enabled  by  the  availability  of  our  parallel  cluster 
resource  was  the  development  of  parallel  codes  for  Computational  Fluid  Dynamics 
simulation.  This  allowed  for  the  resolution  of  phenomena  such  as  the  interaction  of 
multiple  immiscible  or  chemically  reacting  fluid  phases.  In  [LSSF]  we  built  on  the 
particle  level  set  method  [EFFM]  while 
extending  it  to  the  simulation  of  multiple 
fluids  on  the  same  simulation  grid. 

Approaches  such  as  the  Ghost  Fluid 
Method  have  been  successfully 
employed  to  resolve  the  difficulties  of 
integrating  the  Navier-Stokes  equations 
in  the  presence  of  discontinuities  in  the 
state  variables  occurring  across  region 
boundaries  corresponding  to  distinct 
phases,  such  as  water  and  air  or  liquid 
fuel  and  gaseous  combustion  products. 

However,  additional  complications  occur 
when  more  than  two  phases  interact  in  ways  that  may  give  rise  to  triple  points,  where 
preventing  the  formation  of  vacuum  or  material  overlap  is  particularly  challenging.  The 
work  of  [LSSF]  (see  also  [HSKF])  proposes  a  method  for  applying  jump  conditions 
across  discontinuities  without  the  explicit  need  for  ghost  cell  storage,  computing  them  on 
the fly instead. Additionally, it proposes a novel projection method that eliminates the
problems of vacuum formation or material overlap while preserving the signed distance
property of the level set functions that represent the fluid volume. Such
advances were particularly important in improving the efficiency of the partitioning and
communication schemes required for mapping CFD computations across a grid of
processors, while reducing the parallelization overhead. In conjunction with the use of
grid-conscious preconditioning schemes, this allowed for the simulation of complex
scenarios, such as the interplay of fluid phases of varying viscosity, density and
viscoelastic behavior, at uniform resolutions that were never feasible on individual CPUs
and became possible after mapping to several parallel nodes (typically as many as 24-32).
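
To make the vacuum and overlap problem concrete, the following is a minimal Python sketch of one simple per-point projection: shift every phase's level set value by the average of the two smallest values. This particular rule, the function name project_level_sets and the sample values are assumptions chosen for illustration and should not be read as the exact algorithm of [LSSF]. The sketch only demonstrates that, after the shift, exactly one phase claims the point; it does not demonstrate the signed distance property.

```python
def project_level_sets(phis):
    """Project the per-phase level set values stored at one grid point.

    phis: list of signed distances, one per fluid phase (negative = inside).
    Returns a new list shifted by the average of the two smallest values,
    so the smallest value becomes non-positive and all others non-negative.
    """
    if len(phis) < 2:
        raise ValueError("need at least two phases")
    smallest, second = sorted(phis)[:2]
    shift = 0.5 * (smallest + second)
    return [phi - shift for phi in phis]


# Overlap: two phases both claim the point (two negative values).
overlap = project_level_sets([-0.3, -0.1, 0.4])

# Vacuum: no phase claims the point (all values positive).
vacuum = project_level_sets([0.2, 0.6, 0.5])

for name, values in (("overlap", overlap), ("vacuum", vacuum)):
    inside = [i for i, phi in enumerate(values) if phi < 0]
    print(name, "-> phases inside after projection:", inside)
```

Because the shift is computed independently at each grid point, a rule of this kind needs no data from neighboring cells and therefore adds no communication across partition boundaries.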

Beyond  the  mapping  of  general  uniform  grid  discretizations  to  a  parallel  array  of 
computing nodes, we investigated discretization methods that are explicitly optimized for
simulation on parallel hardware. Adaptive
discretizations  such  as  octree  structures 
[LGF]  map  well  to  single  CPU  platforms 
or shared memory systems. Clustered
hardware, by contrast, favors
uniform grids, which minimize and
simplify the geometry of the partition
boundaries across which information
needs to be propagated. In [IGLF] we
proposed  a  hybrid  approach  that  combines 
the  uniform  structure  of  a  2D  Cartesian 
grid  with  the  compactness  of 
representation  offered  by  a  Run  Length 
Encoding scheme. We hybridize the two discretizations by using the uniform grid
along the horizontal dimensions and compressing the vertical dimension with the RLE
scheme. A number of uniform cells are maintained around the air-water interface to
resolve  the  detail  of  flow  near  the  surface.  We  use  this  hybrid  grid  to  discretize  the  full 
Navier-Stokes  equations  instead  of  resorting  to  a  deep  water  or  shallow  water 
approximation  to  reduce  our  problem  to  two  dimensions.  Mapping  to  a  horizontal  grid  of 
processors  is  straightforward  since  the  partition  boundaries  are  planar  surfaces.  We  have 
successfully  simulated  grids  at  a  resolution  as  high  as  2000x200  for  the  horizontal 
component  of  the  grid,  after  mapping  to  a  linear  array  of  16  processors. 
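
The hybrid layout described above can be illustrated with a small Python sketch. It run-length encodes a single vertical column, keeping unit cells in a band around the interface, and splits the columns along one horizontal axis into contiguous slabs for a linear array of processors. The function names, the band width of 3 cells, the column height of 200 cells and the interface position are hypothetical values chosen for illustration; only the 2000 columns and 16 processors come from the text. This is a sketch of the idea, not the data structure used in [IGLF].

```python
def compress_column(num_cells, interface_index, band):
    """Run-length encode one vertical column of the grid.

    Cells within `band` cells of the interface stay as unit-length runs;
    cells farther away are merged into one long run below and one above.
    Returns a list of (start, length) runs covering cells 0..num_cells-1.
    """
    lo = max(0, interface_index - band)
    hi = min(num_cells, interface_index + band + 1)
    runs = []
    if lo > 0:
        runs.append((0, lo))                     # deep water: one long run
    runs.extend((k, 1) for k in range(lo, hi))   # uniform cells near surface
    if hi < num_cells:
        runs.append((hi, num_cells - hi))        # air far above: one long run
    return runs


def slab_ranges(num_columns, num_procs):
    """Split column indices into contiguous, near-equal slabs, one per processor."""
    base, extra = divmod(num_columns, num_procs)
    ranges, start = [], 0
    for rank in range(num_procs):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


column = compress_column(num_cells=200, interface_index=120, band=3)
slabs = slab_ranges(num_columns=2000, num_procs=16)

print("runs stored for one column:", len(column), "instead of 200 cells")
print("columns owned by first and last processor:", slabs[0], slabs[-1])
```

Since each slab boundary is a vertical plane, neighboring processors only need to exchange the columns adjacent to that plane, regardless of how each column is compressed.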

Continuum  mechanics 

We  have  successfully  applied  Finite  Element  formulations  to  create  anatomically  and 
biomechanically  accurate  simulations  of  the  human  musculoskeletal  system  [TSBNLF] 
as well as facial expressions [SSRF]. Both cases exemplify the importance of efficient and
scalable simulation frameworks for deformable continua, since they both demand use of
simulation  meshes  in  excess  of  one  million  tetrahedral  elements.  In  [SSRF]  we  exploited 
the  time  independence  property  of  quasistatic  simulation  to  partition  a  facial  expression 
analysis  and  simulation  task  in  time,  before  dispatching  it  to  more  than  40  computing 
nodes  for  parallel  batch  processing,  to  obtain  a  full  muscle-based  description  of  the 
phonemic  spectrum  of  a  subject.  Such  analyses  may  be  used  in  speech  synthesis, 
prediction of the impact of surgical corrections on facial motion and speech articulation, and
virtual surgery planning. We are currently working on a fully parallel Finite Element
simulation framework using a spatial, rather than temporal, partitioning, which will enable
the parallelization of time-dependent
integration schemes such as the semi-implicit
Newmark scheme used in [BFA,ITF,TSBNLF].
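
The temporal partitioning used for the quasistatic case can be sketched in Python as follows. Because each quasistatic frame depends only on its own input, frames can be solved in any order and on any worker. In this sketch a thread pool stands in for the cluster's batch queue, solve_frame is a hypothetical placeholder for the actual Finite Element solve, and the counts of 400 frames and 40 workers are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_frame(frame_index):
    """Hypothetical stand-in for one quasistatic solve.

    The result depends only on this frame's own input, never on the
    result of a neighboring frame, which is what permits partitioning in time.
    """
    target = 0.01 * frame_index            # placeholder input for this frame
    return frame_index, target * target    # placeholder result


def solve_sequence(num_frames, num_workers):
    """Solve all frames independently and return the results in frame order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = dict(pool.map(solve_frame, range(num_frames)))
    return [results[i] for i in range(num_frames)]


results = solve_sequence(num_frames=400, num_workers=40)
print("frames solved:", len(results))
```

A time-dependent scheme does not have this property, since each step needs the state produced by the previous one, which is why it calls for a spatial partitioning instead.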

Mapping  complex  continuum  mechanics 
simulations  to  clustered  hardware  will  enable 
resolutions  of  several  million  elements, 
needed  for  full-body  active  musculature 
simulation  and  enable  applications  such  as 
facial reconstructive surgery planning to
operate at near-interactive rates, dramatically increasing their impact and usability in
medical  environments.  Finally,  we  seek  to  extend  methods  for  dynamically  changing 
solid  topology  [MBF,BHTF]  to  operate  in  a  parallel  simulation  environment  to  enable 
large-scale simulations of material damage resulting from impact, or to model the process of
tissue cutting and suturing during a simulated virtual surgery.
FINAL EQUIPMENT LIST


ONR GRANT NUMBER N-00014-05-1-0479
STANFORD  SPONSORED  PROJECTS  NUMBER  32336 
A  COMPUTING  CLUSTER  FOR  NUMERIC  SIMULATION 


Type of Equipment                  Manufacturer and Model Number  Quantity       Cost

Compute Servers                    Sun Fire V40z compute nodes          20   $391,853
                                   Sun Fire V20z compute nodes
Rack/Cabinets                      Sun Rack 1000-38                     10      4,195
Cables and switches                Nortel Baystack Switches             18      4,220
Backup system for 20-node cluster  Sun StorEdge 3511 Rack Ready          1     13,170
Additional compute node            Sun Fire X4100                        1      8,995

TOTAL COST                                                                   $422,433


